A data-driven method for modeling pronunciation variation

Authors

  • Judith M. Kessens
  • Catia Cucchiarini
  • Helmer Strik
Abstract

This paper describes a rule-based data-driven (DD) method to model pronunciation variation in automatic speech recognition (ASR). The DD method consists of the following steps. First, the possible pronunciation variants are generated by making each phone in the canonical transcription of the word optional. Next, forced recognition is performed to determine which variant best matches the acoustic signal. Finally, the rules are derived by aligning the best matching variant with the canonical transcription of the variant. Error analysis is performed to gain insight into the process of pronunciation modeling. This analysis shows that although modeling pronunciation variation brings about improvements, deteriorations are also introduced. A strong correlation is found between the number of improvements and deteriorations per rule. This result indicates that it is not possible to improve ASR performance by excluding the rules that cause deteriorations, because these rules also produce a considerable number of improvements. Finally, we compare three different criteria for rule selection. This comparison indicates that the absolute frequency of rule application (Fabs) is the most suitable criterion for rule selection. For the best testing condition, a statistically significant reduction in word error rate (WER) of 1.4% absolute, or 8% relative, is found. © 2002 Elsevier Science B.V. All rights reserved.
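
The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `#` word-boundary symbol, and the (left context, deleted phone, right context) rule format are assumptions made here for illustration. Forced recognition needs an acoustic model, so the sketch takes the best matching variant as given input. The selection step simply counts how often each rule is applied, in the spirit of the Fabs criterion, and keeps rules at or above a hypothetical threshold.

```python
from collections import Counter
from itertools import product


def generate_variants(canonical):
    """Return every variant obtained by making each phone optional.

    Assumption: the empty variant is excluded. Duplicates that arise
    from repeated phones are dropped; the canonical form comes first.
    """
    variants = []
    for mask in product([True, False], repeat=len(canonical)):
        kept = tuple(p for p, keep in zip(canonical, mask) if keep)
        if kept:
            variants.append(kept)
    return list(dict.fromkeys(variants))  # de-duplicate, keep order


def derive_deletion_rules(canonical, variant):
    """Align a variant with its canonical transcription.

    Because variants only differ by deletions, a greedy left-to-right
    match is enough. Each deleted phone yields one hypothetical rule
    (left context, deleted phone, right context), with '#' marking a
    word boundary.
    """
    rules = []
    j = 0  # position in the variant
    for i, phone in enumerate(canonical):
        if j < len(variant) and variant[j] == phone:
            j += 1  # phone is realized in the variant
        else:
            left = canonical[i - 1] if i > 0 else "#"
            right = canonical[i + 1] if i + 1 < len(canonical) else "#"
            rules.append((left, phone, right))
    if j != len(variant):
        raise ValueError("variant is not a deletion variant of canonical")
    return rules


def select_rules_by_fabs(pairs, min_count):
    """Keep rules whose absolute application count is >= min_count.

    `pairs` holds (canonical, best_variant) tuples; `min_count` is an
    illustrative threshold, not a value taken from the paper.
    """
    counts = Counter()
    for canonical, variant in pairs:
        counts.update(derive_deletion_rules(canonical, variant))
    return {rule: n for rule, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    word = ("t", "e", "s", "t")
    print(len(generate_variants(word)), "candidate variants")
    # Pretend forced recognition chose the variant without the final /t/.
    print(derive_deletion_rules(word, ("t", "e", "s")))
```

The number of candidate variants grows exponentially with word length, so a practical system would have to constrain generation in some way. The sketch makes no attempt to do so.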

Similar resources

Statistical Modeling of Pronunciation Variation by Hierarchical Grouping Rule Inference

In this paper, a data-driven approach to statistical modeling of pronunciation variation is proposed. It consists of learning stochastic pronunciation rules. The proposed method jointly models different rules that define the same transformation. The Hierarchical Grouping Rule Inference (HIEGRI) algorithm is proposed to generate this model based on graphs. The HIEGRI algorithm detects the common patterns of ...

Modeling Pronunciation Variation for Asr: Comparing Criteria for Rule Selection

In this paper we use a data-driven (DD) rule-based method for modeling pronunciation variation. Error analysis is performed in order to gain insight into the effect of pronunciation variation modeling. This analysis shows that although modeling pronunciation variation brings about improvements, deteriorations are also introduced. A strong correlation is found between the number of improvements ...

Data-driven Pronunciation Modeling for AS

We describe a method to model pronunciation variation for ASR in a data-driven way, namely by use of automatically derived acoustic subword units. The inventory of units is designed so as to produce maximal separable pronunciation variants of words while at the same time only the most important variants for the particular application are trained. In doing so, the optimal number of variants per ...

Data-driven pronunciation modeling for ASR using acoustic subword units

We describe a method to model pronunciation variation for ASR in a data-driven way, namely by use of automatically derived acoustic subword units. The inventory of units is designed so as to produce maximal separable pronunciation variants of words while at the same time only the most important variants for the particular application are trained. In doing so, the optimal number of variants per ...

Accent-specific Mandarin adaptation based on pronunciation modeling technology

An accent adaptation approach using pronunciation variation modeling technology for Mandarin accents is proposed in this paper. As the Chinese language is monosyllabic, a syllable pronunciation variation dictionary (SPVD) was built to depict the characteristics of the accent. First, the pronunciation modeling technology was used to get the context-independent and context-dependent accent-specifi...

Journal title:
  • Speech Communication

Volume 40

Publication date: 2003